




drawing connections to Feldman's work (L36), but we agree that the relation between the three topics should be

Neural Information Processing Systems

Thank you all for your thoughtful comments; we address your concerns below. The MDL principle formalizes Occam's razor. We will add a discussion of such relevant studies to Section 1. We will add these results and accompanying visualizations to the appendix.

Model (solver)                         Time (ms)
MAC                                    153
DAFT MAC (euler)
DAFT MAC (rk4)
DAFT MAC (dopri5; used in training)

We found that during evaluation, rk4 solves all the dynamics generated from the CLEVR dataset.
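The solvers compared above (euler, rk4, dopri5) are standard ODE integrators; the response's point is that a cheaper fixed-step solver can replace the adaptive dopri5 used in training. A minimal sketch of the fixed-step solvers on a toy system illustrates the accuracy trade-off (plain Python, no ODE library; the function names are illustrative, not the authors' code):

```python
import math

def euler_step(f, y, t, h):
    # Single forward-Euler step: y_{n+1} = y_n + h * f(t, y_n)
    return y + h * f(t, y)

def rk4_step(f, y, t, h):
    # Classical fourth-order Runge-Kutta step
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(step, f, y0, t0, t1, n):
    # Integrate y' = f(t, y) from t0 to t1 in n fixed steps
    h = (t1 - t0) / n
    y, t = y0, t0
    for _ in range(n):
        y = step(f, y, t, h)
        t += h
    return y

# Toy dynamics y' = -y with exact solution y(1) = e^{-1}
f = lambda t, y: -y
exact = math.exp(-1.0)
err_euler = abs(integrate(euler_step, f, 1.0, 0.0, 1.0, 20) - exact)
err_rk4 = abs(integrate(rk4_step, f, 1.0, 0.0, 1.0, 20) - exact)
```

With the same 20 steps, rk4's error is several orders of magnitude below Euler's, which is consistent with rk4 being accurate enough to replace dopri5 at evaluation time.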


Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning

Wonjae Kim, Yoonho Lee

arXiv.org Machine Learning

Without relevant human priors, neural networks may learn uninterpretable features. We propose Dynamics of Attention for Focus Transition (DAFT) as a human prior for machine reasoning. DAFT is a novel method that regularizes attention-based reasoning by modelling it as a continuous dynamical system using neural ordinary differential equations. As a proof of concept, we augment a state-of-the-art visual reasoning model with DAFT. Our experiments reveal that applying DAFT yields similar performance to the original model while using fewer reasoning steps, showing that it implicitly learns to skip unnecessary steps. We also propose a new metric, Total Length of Transition (TLT), which represents the effective reasoning step size by quantifying how much a given model's focus drifts while reasoning about a question. We show that adding DAFT results in lower TLT, demonstrating that our method indeed obeys the human prior towards shorter reasoning paths in addition to producing more interpretable attention maps.
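The abstract describes TLT as quantifying how much a model's focus drifts across reasoning steps, but does not give the exact formula. A natural reading is to sum the distances between consecutive attention maps; a minimal sketch under that assumption (the L1 distance and the function name are illustrative choices, not necessarily the paper's definition):

```python
def total_length_of_transition(attention_maps):
    """Sum of distances between consecutive attention maps.

    attention_maps: list of equal-length lists, each a distribution
    over image regions at one reasoning step. L1 distance is an
    illustrative choice; the paper may define the metric differently.
    """
    tlt = 0.0
    for prev, curr in zip(attention_maps, attention_maps[1:]):
        tlt += sum(abs(c - p) for p, c in zip(prev, curr))
    return tlt

# Focus that stays on one region contributes nothing; each jump
# between regions adds the distance between the two attention maps.
static = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
jumpy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
```

Under this reading, a model whose attention lingers on relevant regions (as DAFT encourages) gets a low TLT, while one whose focus jumps around gets a high TLT.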